Iterative Policy-Space Expansion in Reinforcement Learning
Humans and animals solve a difficult problem much more easily when they are presented with a sequence of problems that starts simple and slowly increases in difficulty. We explore this idea in the context of reinforcement learning. Rather than being given an externally provided curriculum of progressively more difficult tasks, the agent solves a single task using a decreasingly constrained policy space. The algorithm we propose first learns to categorize features as positive or negative before gradually learning a more refined policy. Experimental results in Tetris demonstrate a superior learning rate of our approach compared to existing algorithms. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 66599
Comment: Workshop on Biological and Artificial Reinforcement Learning at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada
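A minimal sketch of the two-phase idea above, assuming a linear evaluation policy over hand-crafted features and a simple least-squares surrogate for the learning rule (both are assumptions; the paper's actual algorithm may differ):

    import numpy as np

    def learn_feature_signs(features, returns):
        # Phase 1: restrict the policy space to sign-only weights by
        # classifying each feature as positive or negative (assumed
        # criterion: the sign of its correlation with observed returns).
        corr = [np.corrcoef(features[:, j], returns)[0, 1]
                for j in range(features.shape[1])]
        return np.sign(corr)

    def refine_policy(features, returns, signs, lr=0.01, steps=1000):
        # Phase 2: expand the policy space to continuous weights,
        # initialised from the coarse sign-only policy learned above.
        w = signs.astype(float)
        for _ in range(steps):
            grad = features.T @ (features @ w - returns) / len(returns)
            w -= lr * grad
        return w

Constraining the weights to {-1, +1} first gives the agent a crude but quickly learnable policy; relaxing the constraint afterwards recovers the full policy space without starting from scratch.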
Regularization in Directable Environments with Application to Tetris
Learning from small data sets is difficult in the absence of specific domain knowledge. We present a regularized linear model called STEW that benefits from a generic and prevalent form of prior knowledge: feature directions. STEW shrinks weights toward each other, converging to an equal-weights solution in the limit of infinite regularization. We provide theoretical results on the equal-weights solution that explain how STEW can productively trade off bias and variance. Across a wide range of learning problems, including Tetris, STEW outperformed existing linear models, including ridge regression, the Lasso, and the non-negative Lasso, when feature directions were known. The model proved robust to unreliable (or absent) feature directions, still outperforming alternative models under diverse conditions. Our results in Tetris were obtained using a novel approach to learning in sequential decision environments based on multinomial logistic regression. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 66599
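The shrinkage described above admits a compact formulation. A plausible form of the objective, consistent with the abstract (the paper's exact notation may differ), is the squared loss plus a penalty on all pairwise weight differences:

    \hat{w} = \arg\min_{w \in \mathbb{R}^p}
        \lVert y - Xw \rVert_2^2 + \lambda \sum_{i < j} (w_i - w_j)^2

As λ → 0 this reduces to ordinary least squares; as λ → ∞ the pairwise differences are driven to zero and the solution collapses to the equal-weights rule, with the known feature directions absorbed into the signs of the columns of X beforehand. Because the penalty is quadratic, the estimator retains a ridge-like closed-form solution.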
Colour versus Shape Goal Misgeneralization in Reinforcement Learning: A Case Study
We explore the colour versus shape goal misgeneralization originally demonstrated by Di Langosco et al. (2022) in the Procgen Maze environment, where, given an ambiguous choice, agents seem to prefer generalization based on colour rather than shape. After training over 1,000 agents in a simplified version of the environment and evaluating them on over 10 million episodes, we conclude that the behaviour can be attributed to the agents learning to detect the goal object through one specific colour channel, a choice that is arbitrary. Additionally, we show how, due to underspecification, these preferences can change when the agents are retrained using exactly the same procedure except for the random seed of the training run. Finally, we demonstrate the existence of outliers in out-of-distribution behaviour that arise from the training random seed alone.
Comment: ATTRIB: Workshop on Attributing Model Behavior at Scale at NeurIPS 2023
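The seed dependence reported above can be illustrated with a self-contained toy example (not the Procgen setup): when two cues are perfectly redundant during training, gradient descent never needs to choose between them, and the tie is broken by the seed-dependent initialisation alone.

    import numpy as np

    def train_linear(seed, n=200, steps=2000, lr=0.1):
        # Two redundant cues, "colour" and "shape", both perfectly
        # predict the target during training (an underspecified task).
        rng = np.random.default_rng(seed)
        cue = rng.integers(0, 2, n) * 2.0 - 1.0
        X = np.column_stack([cue, cue])   # colour == shape in training
        y = cue
        w = rng.normal(size=2)            # seed-dependent initialisation
        for _ in range(steps):
            w -= lr * X.T @ (X @ w - y) / n
        return w

    for seed in range(5):
        w = train_linear(seed)
        favoured = "colour" if abs(w[0]) > abs(w[1]) else "shape"
        print(f"seed {seed}: weights {w.round(2)}, favours {favoured}")

Because both coordinates receive identical gradients, their initial difference, which is set only by the seed, is preserved throughout training and decides which cue the model follows once the cues disagree at test time.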
Betweenness Centrality as a Basis for Forming Skills
We show that betweenness centrality, a graph-theoretic measure widely used in social network analysis, provides a sound basis for autonomously forming useful high-level behaviors, or skills, from available primitives: the smallest behavioral units available to an autonomous agent.
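A minimal sketch of the measure using networkx (how the paper then builds skills around these states is elided here):

    import networkx as nx

    def subgoal_candidates(transition_graph, k=5):
        # Rank states by betweenness centrality; states that many
        # shortest paths pass through are candidate subgoals around
        # which skills (options) can be formed.
        scores = nx.betweenness_centrality(transition_graph)
        return sorted(scores, key=scores.get, reverse=True)[:k]

    # Example: two densely connected "rooms" joined by a doorway node;
    # the doorway receives the highest betweenness score.
    G = nx.barbell_graph(5, 1)
    print(subgoal_candidates(G, k=3))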
Using Relative Novelty to Identify Useful Temporal Abstractions in Reinforcement Learning
We present a new method for automatically creating useful temporal abstractions in reinforcement learning. We argue that states that allow the agent to transition to a different region of the state space are useful subgoals, and propose a method for identifying them using the concept of relative novelty. When such a state is identified, a temporally extended activity (e.g., an option) is generated that takes the agent efficiently to this state. We illustrate the utility of the method in a number of tasks.
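A hedged sketch of such a score (the window size and the visit-count novelty measure below are assumptions; the paper's exact definitions may differ):

    from collections import defaultdict

    def relative_novelty(trajectory, window=5):
        # Novelty of a state is assumed to decay with its visit count,
        # here n(s) ** -0.5.
        visits = defaultdict(int)
        novelty = []
        for s in trajectory:
            visits[s] += 1
            novelty.append(visits[s] ** -0.5)

        # Relative novelty compares how novel the trajectory is just
        # after visiting a state to just before; peaks mark candidate
        # subgoals such as doorways between regions of the state space.
        scores = {}
        for t in range(window, len(trajectory) - window):
            before = sum(novelty[t - window:t])
            after = sum(novelty[t + 1:t + 1 + window])
            s = trajectory[t]
            scores[s] = max(scores.get(s, 0.0), after / before)
        return scores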
Creating Multi-Level Skill Hierarchies in Reinforcement Learning
What is a useful skill hierarchy for an autonomous agent? We propose an answer based on a graphical representation of how the interaction between an agent and its environment may unfold. Our approach uses modularity maximisation as a central organising principle to expose the structure of the interaction graph at multiple levels of abstraction. The result is a collection of skills that operate at varying time scales, organised into a hierarchy, where skills that operate over longer time scales are composed of skills that operate over shorter time scales. The entire skill hierarchy is generated automatically, with no human intervention, including the skills themselves (their behaviour, when they can be called, and when they terminate) as well as the hierarchical dependency structure between them. In a wide range of environments, this approach generates skill hierarchies that are intuitively appealing and that considerably improve the learning performance of the agent.
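A minimal sketch of the central organising principle using networkx (how the skills themselves, their initiation conditions, and their termination conditions are derived from the partition is elided here):

    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    def community_hierarchy(graph, min_size=4):
        # Recursively partition the interaction graph by modularity
        # maximisation; each level of the resulting tree corresponds
        # to skills operating at a coarser time scale than the level below.
        communities = greedy_modularity_communities(graph)
        if len(communities) <= 1:
            return sorted(graph.nodes)
        return [
            community_hierarchy(graph.subgraph(c)) if len(c) > min_size
            else sorted(c)
            for c in communities
        ]

    G = nx.barbell_graph(6, 2)
    print(community_hierarchy(G))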
- …